The Index-Based Subgraph Matching Algorithm (ISMA): Fast Subgraph Enumeration in Large Networks Using Optimized Search Trees

نویسندگان

  • Sofie Demeyer
  • Tom Michoel
  • Jan Fostier
  • Pieter Audenaert
  • Mario Pickavet
  • Piet Demeester
چکیده

Subgraph matching algorithms are designed to find all instances of predefined subgraphs in a large graph or network and play an important role in the discovery and analysis of so-called network motifs, subgraph patterns which occur more often than expected by chance. We present the index-based subgraph matching algorithm (ISMA), a novel tree-based algorithm. ISMA realizes a speedup compared to existing algorithms by carefully selecting the order in which the nodes of a query subgraph are investigated. In order to achieve this, we developed a number of data structures and maximally exploited symmetry characteristics of the subgraph. We compared ISMA to a naive recursive tree-based algorithm and to a number of well-known subgraph matching algorithms. Our algorithm outperforms the other algorithms, especially on large networks and with large query subgraphs. An implementation of ISMA in Java is freely available at http://sourceforge.net/projects/isma/.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Index-Based Subgraph Matching Algorithm with General Symmetries (ISMAGS): Exploiting Symmetry for Faster Subgraph Enumeration

Subgraph matching algorithms are used to find and enumerate specific interconnection structures in networks. By enumerating these specific structures/subgraphs, the fundamental properties of the network can be derived. More specifically in biological networks, subgraph matching algorithms are used to discover network motifs, specific patterns occurring more often than expected by chance. Findin...

متن کامل

NOVA: A Novel and Efficient Framework for Finding Subgraph Isomorphism Mappings in Large Graphs

Considerable efforts have been spent in studying subgraph problem. Traditional subgraph containment query is to retrieve all database graphs which contain the query graph g. A variation to that is to find all occurrences of a particular pattern(the query) in a large database graph. We call it subgraph matching problem. The state of art solution to this problem is GADDI. In this paper, we will p...

متن کامل

GiS: Fast Indexing and Querying of Graph Structures

We propose a new way of indexing a large database of graphs and processing exact subgraph matching (or subgraph isomorphism) and approximate (full) graph matching queries. Rather that decomposing a graph into smaller units (e.g., paths, trees, graphs) for indexing purposes, we represent each graph in the database by its graph signature, which is essentially a multiset, and each signature is the...

متن کامل

SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs

With the emergence of new applications, e.g., computational biology, new software engineering techniques, social networks, etc., more data is in the form of graphs. Locating occurrences of a query graph in a large database graph is an important research topic. Due to the existence of noise (e.g., missing edges) in the large database graph, we investigate the problem of approximate subgraph inde...

متن کامل

NeMa: Fast Graph Search with Label Similarity

It is increasingly common to find real-life data represented as networks of labeled, heterogeneous entities. To query these networks, one often needs to identify the matches of a given query graph in a (typically large) network modeled as a target graph. Due to noise and the lack of fixed schema in the target graph, the query graph can substantially differ from its matches in the target graph i...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 8  شماره 

صفحات  -

تاریخ انتشار 2013